Documenting the History of the Digital Age: 


What do we Know? 


Vinton G. Cerf and John C. Klensin


MCI Communications Corporation 


Introduction 


This paper discusses in brief terms what we know about the history of the Digital Age and 
then explores how we know it and what impact the Internet, its World Wide Web, email, file 
repositories and other functions, may have on our knowledge of history and our ability to 
record, organize, and recall facts and events in that history. Any attempt to cover 
comprehensively what we know about the Digital Age would surely take many terabytes. At
best, one might be able to try to select some key events in the history of the Digital Age and, 
even then, there are sure to be disputes over the choice of the events deemed to be seminal. 
Alternatively, one can consider the nature of what we know, without attempting to be 
comprehensive and specific about its substance. This brief paper timidly takes the latter 
approach.


Computer Evolution 


Strictly speaking, one might mark the beginning of the history of the Digital Age with the 
invention of various calculating engines attributable to legendary names such as Pascal, 
Leibniz and Babbage. Indeed, one might even choose to go back some 5,000 years to the
invention of the abacus. The stones or beads of this device might be said to represent the
eolithic period in the history of digital computing. Pascal and Leibniz invented more
advanced, but still mechanical, calculating engines in the 17th Century, as did Babbage in the
19th. If we adopt the term eolithic for this early mechanical work, we might then speak of the
work of Hollerith, Atanasoff, Von Neumann, Aiken, Eckert, Mauchly, Turing, Zuse, and
many others of the first half of the 20th Century as the paleolithic era. With those examples as
preamble, we might then mark the beginning of the neolithic with the invention of the
transistor in 1947 by Bardeen, Brattain and Shockley; the integrated circuit by Jack Kilby
and Robert Noyce in 1958; and the Intel 4004 microprocessor on a chip by Marcian Hoff in
1969. The dramatic evolution in storage media and costs has been nearly as important as
that in processors. This story, too, starts with beads, but evolves along two paths: one from
tubes and delay lines to core memory and thence to high-density dynamic and static
solid state memories, the other from wire boards and paper tape to the high-density rotating
devices and very high capacity tapes of today. If we were still faced with the storage costs and
densities of the 1960s, we couldn’t have the web (or MS Word) today -- they are just too
demanding in terms of storage utilization.


To these early results, we can add the general evolution of large scale integration, reaching 
now the middle years of VLSI with the Intel Pentium MMX, Motorola 60X series PowerPC 
chips, MIPS 10000, and the special purpose chip sets used in high end supercomputers such
as the various machines of the CRAY line. Some evolutionary branches, such as the
once-promising Connection Machine series invented by Danny Hillis and built by Thinking
Machines, Inc., seem to have died out. Departmental computers, the so-called mini-machines,
such as the Digital PDP-11 and VAX series, have prospered, although they are being
displaced by high end workstations that have evolved into servers, while workstations are
being driven from below by high-end personal computers. Every generation of machine
seems to drive out previous, higher-end generations, although it appears that no class of the 
basic taxonomy has been driven totally to extinction. Thus we still have supercomputers, 
mainframes, departmental servers, workstations, personal computers, palm tops and personal 
digital assistants. Certain species have definitely died, but classes seem to grow, at the lower 
end, as technology allows us to place processing where we wish it at lower and lower cost. 


But computers form only a part of the story. A similar evolution has taken place in other areas 
such as programming languages, operating system design, and computer networking. To 
these evolutionary vectors, we may add the general digitization of everything - voice 
telephony, audio recordings, video camcorders, clocks, and the controls of automobiles and 
an increasing number of household appliances. 


In the end, all of these manifestations of the Digital Age merely form the stage upon which a 
drama is playing out: the evolution of software applications. In all probability, there is no end 
to this play. On this stage, virtually anything that can be programmed is possible, and what 
limits are there to the creativity of the human mind? 


Programming Languages and Operating Systems 


The evolution of computers from discrete, reed relay and tube switches to transistors, to very 
large scale integration microprocessors is plainly an important part of what we know. The 
development of programming languages from assembly languages to FORTRAN, COBOL,
ALGOL, PL/I, PASCAL, SMALLTALK, C, C++, PERL and JAVA marks yet another
important evolutionary dimension. Operating systems started with the early batch, stand-alone
varieties, evolved into multi-tasking systems such as IBM’s OS/MFT, OS/MVT, and MVS,
and UNIVAC’s EXECS, and into interactive time-sharing systems, such as MIT’s CTSS, BBN’s
TENEX, AT&T’s UNIX and Digital’s TOPS-10 and later VMS for the VAX series. Control
Data’s COMPASS operating system was representative of a class of operating environments.
The Multics operating system developed at MIT for a modified GE-645 machine deserves
special mention for its hardware-assisted security capabilities and for being the first
significant realization of a number of ideas that were strongly influential in the design of
subsequent, and even contemporary, systems. Time-shared systems were used extensively in
networked environments, such as the ARPANET. Workstations, from such companies as
SUN Microsystems, Hewlett-Packard and Silicon Graphics, typically ran some variant of
the UNIX operating system, as did many Digital systems, in addition to Digital’s VMS.


Oddly, with the arrival of personal computing in the late 1970s, users were returned to 
systems that were closer to their ancient, stand-alone cousins. Apple’s pre-Macintosh 
operating system essentially allowed the operation of one program at a time. The Macintosh 
MAC OS allowed more than one program to be initiated, but typically only one would be 
active at any one time. The same was true of Microsoft's MS-DOS and Windows operating
systems. There were some concurrent I/O opportunities allowed, but it was not until UNIX 
was ported to PCs and Microsoft produced its NT system that one had multiprocessing 
capability on the desktop. 


Networking and Protocols 


The networking dimension starts with simple, remote stations, such as the IBM Job Entry 
Stations containing card readers, card punches and printers for the remote submission of 
batch jobs; for example, the use of 1401s and their peripherals as job stream I/O devices for 
the 709/709x under FMS and later IBSYS. As interactive time-sharing was explored, remote
access on the Public Switched Telephone Network (PSTN) became possible at relatively low
speeds (110 to 300 bits per second). These systems were all terminal-to-host designs,
permitting a single system to serve multiple, remote users. In the mid-to-late 1960s, a new
networking concept was explored in the US through the Defense Advanced Research Projects
Agency’s ARPANET and in the UK at the National Physical Laboratory (NPL). Packet
switching was the term used to describe this idea, although it was not called that when it was
first documented in the early 1960s in reports from RAND and from MIT. The project at NPL
involved a single switch, and the originator of the project called the switched units packets.
In some ways, the NPL project might be considered the first local area network. 


The first ARPANET nodes were developed by Bolt Beranek and Newman (BBN) and 
installed late in 1969 and early 1970 at UCLA, Stanford Research Institute (SRI), University 
of California at Santa Barbara (UCSB) and University of Utah. A major public demonstration 
of the ARPANET was conducted in October 1972 during the first International Conference 
on Computer Communication (ICCC) in Washington, DC. At this same conference, the 
International Network Working Group (INWG) was formed, as an analog to the Network 
Working Group (NWG) which developed the first ARPANET computer communication 
protocols. By the early 1970s, many other investigations of packet switching were under way, 
notably the Experimental Packet Switching System developed by the British Post Office, the 
CYCLADES project of the French Institute for Research on Informatics and Automation, the 
Réseau à Commutation par Paquets (RCP) project of the French telephone service and the
Canadian DATAPAC project. In the US, around 1972, BBN started one of the first successful 
packet switching services, TELENET, but used a set of protocols that were very different 
from those in use on the ARPANET (see Protocol Wars, below). 


Computer communication protocols emerged as the central theme of the ARPANET effort 
and were developed in the context of interconnected time-sharing systems. The basic 
interface between the ARPANET nodes (called Interface Message Processors, or IMPs) was 
defined in a report from BBN: BBN1822. It became the bible for interconnecting computers 
on the network. But software was needed to allow computers of vastly different speeds, word 
sizes and formats to interwork. Standards had to be set to achieve commonality. Originating 
with a small group of graduate students at UCLA, University of Utah, UCSB and researchers 
at SRI, the ARPA Network Working Group tackled the host protocol problem, starting 
before the first equipment had been delivered. The result, motivated strongly by the 1972 
ICCC demonstrations, was a collection of layered protocols, with BBN1822 at the bottom, 
the Network Control Protocol (NCP) and Initial Connection Protocol (ICP) at the next higher
layer. Utilities and applications were layered above NCP and were developed by groups
across the country involved in the host protocol work. Examples of this layer include file
transfer (FTP) and remote terminal access (TELNET). Electronic mail was, at first, an
extension of FTP but later used its own Simple Mail Transfer Protocol (SMTP). Many other
protocols were developed, together with a number of applications, in the ensuing 20 years
before the ARPANET’s retirement in 1990.


The initial ARPANET operated on 50 Kb/s circuits provided by AT&T. The network started 
with 4 nodes but grew quickly to its planned 19 nodes, and then expanded as success in 
military, academic and research communities drove increased demand. 


Commercial networking began to emerge in this same time period (late 1960s and early 
1970s). Proprietary work at Digital Equipment Corporation on DECNET, at IBM on Bisynch 
and later, Systems Network Architecture (SNA) and standards efforts on X.25 at the 
Consultative Committee on International Telegraphy and Telephony (CCITT) of the 
International Telecommunications Union (ITU) marked the commercial networking sector. 
Work at Xerox Palo Alto Research Center on the Xerox Networking System (XNS), while not
commercially successful, was taken up by Novell and transformed into Novell's Netware and 
IPX products which were very successful in the market until overtaken by newer, non- 
proprietary international standards. 


With the success of the ARPANET, ARPA began a series of projects to explore the use of 
packet switching in new media, specifically, shared mobile digital radio and shared satellite 
channels. These so-called multi-access channels had their origins in another ARPA 
sponsored project, the ALOHANET, at the University of Hawaii. The developers used old 
taxi radios, outfitted with digital logic to packetize data and to send it in bursts over the air to 
a central computer at the University of Hawaii in Honolulu. The central controller was called 
the Menehune (which was the Hawaiian word for “imp”). Terminals with text to send would
transmit their bursts whenever they had data ready. If they heard no reply from the
Menehune, they would retransmit. The innovation was that all terminals used the same
inbound radio channel. Collisions would result in garbling and the Menehune performed
checksums to detect such damage. If a collision occurred, the Menehune would ignore the
data. The terminal, hearing no acknowledgment on a separate outbound channel from the
Menehune, would retransmit, but at a random time later, to avoid re-colliding. No wonder
they called it the ALOHANET; it was rather relaxed about the way the terminals took turns.
The inventor of the Ethernet tells of reading a report from this project in 1973, visiting the
site, and returning to XEROX Palo Alto Research Center (PARC) to invent a new form of
local area networking using ALOHANET multi-access protocols on a coaxial cable, adding a
carrier-sensing, collision-detection mechanism which minimized the effects of colliding packets.
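The retransmission rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ALOHANET implementation: time is divided into slots for brevity (the original system was unslotted), each terminal has a single burst to send, and the function name and the `max_backoff_slots` parameter are hypothetical.

```python
import random


def simulate_shared_channel(num_terminals, max_backoff_slots, seed=0, max_slots=10_000):
    """Simulate terminals that each deliver one burst to a central controller.

    Returns (slots_used, collisions). Slotted time is an assumption made only
    to keep this sketch short.
    """
    rng = random.Random(seed)
    # Slot in which each terminal will next transmit; all start at slot 0.
    next_send = {terminal: 0 for terminal in range(num_terminals)}
    collisions = 0
    for slot in range(max_slots):
        if not next_send:
            return slot, collisions
        senders = [t for t, when in next_send.items() if when == slot]
        if len(senders) == 1:
            # A lone burst passes the checksum and is acknowledged.
            del next_send[senders[0]]
        elif len(senders) > 1:
            # Overlapping bursts are garbled, so no acknowledgment is sent.
            collisions += 1
            for t in senders:
                # Hearing nothing, each terminal retries after a random delay.
                next_send[t] = slot + 1 + rng.randrange(max_backoff_slots)
    if not next_send:
        return max_slots, collisions
    raise RuntimeError("terminals still waiting after max_slots")
```

Calling `simulate_shared_channel(5, 16)` lets five terminals that all start at once sort themselves out after a few collisions. Setting `max_backoff_slots=1` removes the randomness, and the colliding terminals then retry in lockstep and never get through, which is why the random delay matters.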


The Packet Radio and Packet Satellite programs extended multi-access technology in two 
ways. In the Packet Radio case, multi-hop services using a single, shared radio channel were 
offered in a dynamically-changing topology. In the case of Packet Satellite, two channels 
were used, one for uplink and one for downlink, so that the terminals could simultaneously
send and receive. Packet Satellite propagation delays were relatively long since it takes at 
least a quarter second for a transmission to propagate from the ground to a synchronous 
satellite and back to Earth again. These two research networks explored dynamic topology 
packet switched networking and long delay networking in multi-access environments. Not 
surprisingly, these networks were very different from the ARPANET and the Ethernet. Their 
packet sizes were different; their error rates and speeds were different. ARPANET ran at 50 
Kb/s using 128 byte packets. The Packet Radio system used 100 Kb/s and 400 Kb/s modes 
with packet sizes on the order of 255 bytes. The Packet Satellite system used 64 Kb/s 
channels and packet sizes up to about 512 bytes. 
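The figures in this paragraph can be checked with simple arithmetic. The sketch below rests on assumptions not stated in the text: a geosynchronous altitude of about 35,786 km, a satellite directly overhead, and "Kb/s" read as 1,000 bits per second. The function names are illustrative only.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
GEOSYNCHRONOUS_ALTITUDE_KM = 35_786  # assumption: not given in the text


def ground_satellite_ground_delay(altitude_km=GEOSYNCHRONOUS_ALTITUDE_KM):
    """Seconds for a signal to travel up to the satellite and back down."""
    return 2 * altitude_km / SPEED_OF_LIGHT_KM_S


def transmission_time(packet_bytes, bits_per_second):
    """Seconds needed to clock one packet onto a channel of the given speed."""
    return packet_bytes * 8 / bits_per_second


# Packet sizes and channel speeds quoted in the paragraph above.
NETWORKS = {
    "ARPANET": (128, 50_000),
    "Packet Radio (slow mode)": (255, 100_000),
    "Packet Radio (fast mode)": (255, 400_000),
    "Packet Satellite": (512, 64_000),
}

for name, (size, speed) in NETWORKS.items():
    print(f"{name}: {transmission_time(size, speed) * 1000:.2f} ms per packet")
print(f"Satellite hop: {ground_satellite_ground_delay() * 1000:.0f} ms")
```

Under these assumptions the satellite hop comes to roughly 0.24 seconds, close to the quarter-second figure in the text. That is several times longer than it takes to transmit even the largest packet listed, which is what made the satellite network a long-delay environment.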


For lack of time and space, I have left out the important story of ring-based networks such as 
the Distributed Computing System (DCS) from University of California, Irvine; the PRONET 
from Proteon, based on work at MIT based on DCS; and Fiber Distributed Data Interface 
(FDDI) rings. And I have left out the fascinating histories of Asynchronous Transfer Mode
(ATM) switching, Frame Relay, and the recent emergence of IP switching. A comprehensive
discussion would surely include these and many other important networking developments
that took place in the last quarter of the 20th Century and the last minutes, figuratively
speaking, of the Second Millennium.


INTERNET 


At the time that the Packet Radio and Packet Satellite experiments were underway, it was 
logical to ask how these different packet networks could be interconnected. Out of this
question came the ARPA internetting project. The objective was to develop a set of
protocols which would permit transparent networking across a collection of inter-linked 
packet networks. A number of principles guided the design: 


1. Each network was to remain independent and unmodified.

2. The interconnecting black boxes (later called gateways and, still later, routers) should
retain as little state information as possible.

3. The networks would not be relied upon for reliability. This had to be achieved on an
end-to-end basis.

4. There would be no central, global control.


The focus of the research for what is now called the Internet was on the protocols to be used 
by the hosts on an end-to-end basis and the protocols to be used in the routers to interlink the 
various networks to each other. It was recognized almost immediately that a new set of end- 
to-end protocols to replace the older ARPANET Network Control Protocol (NCP) and Initial 
Connection Protocol (ICP) would be needed to achieve better performance and end-to-end 
reliability. The older protocols had depended on the reliability of the ARPANET and its 
ability to sequence traffic to achieve reliable, ordered communication. Moreover, it was clear 
that the gateways (routers) would need some way to communicate topological information to 
route packets from their sources to their destinations. 


In the initial implementation, a single Transmission Control Protocol (TCP) was specified 
which provided reliable, sequenced, flow-controlled and error-free communication on an end- 
to-end basis. The gateways were aware of the format of TCP packets and used their headers 
to make routing decisions. After two iterations on the design, and some extensive testing, it 
was concluded that to support real-time applications that did not require sequencing as much 
as low delay, the TCP protocol would be split into an Internet Protocol (IP) and a residual 
TCP which only concerned itself with end-to-end functions (flow control, sequencing, error
recovery and multiplexing). 
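The end-to-end sequencing function mentioned here can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the TCP specification: each datagram carries a hypothetical per-datagram sequence number starting at zero, and the receiving host alone is responsible for putting payloads back in order and discarding duplicates, since the networks in between promise neither.

```python
def deliver_in_order(datagrams):
    """Reassemble (sequence_number, payload) pairs that may arrive out of
    order or more than once.

    Returns (delivered_payloads, next_expected), where next_expected is the
    sequence number the receiver is still waiting for.
    """
    expected = 0
    held = {}  # payloads that arrived ahead of a missing datagram
    delivered = []
    for seq, payload in datagrams:
        if seq < expected or seq in held:
            continue  # duplicate of something already seen
        held[seq] = payload
        # Release every payload that is now contiguous with what was delivered.
        while expected in held:
            delivered.append(held.pop(expected))
            expected += 1
    return delivered, expected
```

For example, `deliver_in_order([(1, "b"), (0, "a"), (0, "a"), (3, "d"), (2, "c")])` hands the application the payloads in their original order. If a datagram is lost, delivery stops at the gap, and the returned `next_expected` value is what a sender would need in order to retransmit.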


The idea of layered protocols was lifted from the ARPANET experience and used 
extensively in the Internet design. The various utility and application protocols for remote 
access (TELNET), file transfer (FTP) and electronic messaging (SMTP) were taken into the 
Internet system almost without modification. 

After some four iterations on design, implementation and testing, with an international 
community of interest (using the INWG formed at ICCC in 1972 and specific, ARPA-funded 
research groups in the US and Europe), the Internet Protocols stabilized in their fourth version 
by about 1978. Documentation, testing and wider implementation occupied the next 4 years 
until January, 1983, when it was decreed that all ARPANET hosts would migrate to the new 
TCP/IP protocol suite. For many, January 1983 marks the birth of the operational Internet. 


By the mid-1980s, several important events influenced the future course of the Internet. 
Commercial routers were offered by companies such as Cisco Systems and Proteon. The 
National Science Foundation (NSF) initiated its NSFNET project which led to the creation of
a number of intermediate level networks to provide connectivity between university and
research sites and the NSFNET backbone. And interest in the Internet began to emerge more 
fully in the European community as well as the United States with the commercial availability 
of local area networks, UNIX systems with TCP/IP built-in, and routers from commercial 
sources. 


By the late 1980s, US Government policy was shifting to allow limited commercial access to 
the Internet, particularly with its decision to allow a commercial email service, MCI Mail, to 
interconnect with the Internet in 1989. Acceptable Use Policy (AUP) for the Internet
continued to evolve until the retirement of the NSFNET in April, 1995, at which point all
restrictions were removed on the general use of the Internet. As early as 1990/1991,
commercial Internet services were emerging, most notably Alternet operated by UUNET and
PSINET, both of which began as non-profit organizations serving the academic and research
communities. The NSFNET AUP restrictions drove commercial service providers to find
alternatives to interconnection via NSFNET and a number of them banded together to form a
Commercial Internet eXchange (CIX) to avoid usage limitations. CIX became a kind of
model for the later NSF-sponsored Network Access Points (NAPs) which served as
unrestricted packet exchange points in the post-NSFNET environment.


Other networking activities of this era deserve more than their brief mention here, notably 
USENET, based on the UNIX-to-UNIX Copy Program (UUCP); BITNET, developed 
initially to link academic computing centers using IBM mainframes; and FIDONET, developed to
interlink collections of personal computers through dial-up, store-and-forward protocols.


Protocol Wars 


It would be remiss not to mention, at least, the lengthy period from about 1978 to 1992,
during which considerable debate surrounded two rival non-proprietary standards for
computer communication, the TCP/IP protocols developed for the Internet and the so-called
Open Systems Interconnection (OSI) standards developed in the International Organization
for Standardization (ISO) and the CCITT. The original documentation for the OSI reference
model emerged in an architecture paper in 1978. There ensued many years of detailed 
specification work on a 7-layer architecture, along with a unique vocabulary of terms to 
describe the functionality of the proposed networking protocol suite. 


The origins of this debate were found in the X.25 virtual circuit standards developed in the 
CCITT starting in 1973 and reaching first standardization in 1975. In this model, reliability 
and sequencing would be achieved in the network and not in the end machines. At the time, 
this was an understandable commercial motivation because most computer networking was 
based on dedicated, leased circuits. The commercial providers of computer communication 
network service wanted to emulate as closely as possible the existing circuit services and, 
thus, developed a virtual circuit design. The Internet design, in contrast, assumed that the 
collection of network technologies making up the Internet could not be relied upon to provide 
sequenced, reliable delivery and opted for a datagram mode of operation. All reliability and 
sequencing and error correction would take place on an end-to-end basis. The 
datagram/virtual circuit wars went on for years. X.25 became a very successful service but 
didn’t work well in local area network environments. As LANs became more and more 
prevalent, the Internet datagram mode of operation was increasingly attractive. An effort was 
made to include a datagram mode in X.25 but was abandoned after it appeared to be far too 
complex to be implemented at reasonable cost. 


The OSI protocols were initially based on the virtual circuit assumptions of X.25 and built on 
top of them. An attempt was made to integrate a datagram mode of operation (the so-called
connectionless mode) but this came fairly late in the evolution of the OSI protocols and was
never widely implemented. The connectionless network protocol (CLNP) was actually
implemented in the NSFNET and used in very limited quantities, mostly during 
experimentation with higher-level OSI protocol implementations. In practice, there was only 
modest implementation of the OSI protocol suite, but this did not stop many governments, 
including the United States, from officially adopting the OSI protocol suite. Despite the 
formal adoption, however, the commercial world continued to build and sell Internet-based 
products until, finally, around 1992, it became clear that little of the OSI effort had reached 
commercial viability. Among the long list of OSI protocols specified, those involving 
messaging seemed to achieve the most penetration in the marketplace. The X.400 messaging 
standards achieved prominence and were used to interconnect public electronic mail systems. 
The X.500 directory standards had somewhat less success but exist in a number of products 
and services. They have, however, provided a model for other systems, including many 
LAN-based mail systems that use neither X.400 nor Internet technology, but that have 
borrowed heavily from X.400 terminology and basic design ideas. Interworking of X.400 
and Internet-based electronic messaging systems, together with proprietary LAN-based email 
services is now commonplace, though by no means perfect. The semantics of the various 
systems continue to interfere with satisfactory end-to-end transparency, and much effort 
continues to be expended on achieving better results. 


In the meantime, the Internet standards for messaging, SMTP (RFC 821) and message body
formats (RFC 822), together with the newer Multipurpose Internet Mail Extensions
(MIME) and negotiated SMTP extension system, form the basis for an increasingly
widespread messaging infrastructure. Proprietary protocols in LAN environments continue to 
be a major factor in corporate use of messaging, but these are increasingly forced by necessity 
to interwork with Internet Standards for the benefit of users. 


It is tempting to continue to try to document more of the Digital Age in this paper, but the 
task is boundless and there are a few other topics that deserve attention. 


How Do we Know the History of the Digital Age? 


In some sense, we only know what people take the time to organize and record. We often fail to
record what later turns out to be important. Few, if any, remember the details of the first 
transmissions on the ARPANET. Should we archive everything in the hope that, later, we will 
have important facts available when they prove to be important (or at least interesting to 
know)? Many historians would appreciate a full record of everything that happens, for 
reference purposes. Software people never got into the habit of using lab notebooks on a 
regular basis, unlike their hardware design counterparts. As a consequence, much is not known
in any shared form of the blind alleys and blinding inspirations of the software age. Popular 
accounts get at part of it, but these are often based on recollection rather than documentation. 
Even remembering WHEN something happens is difficult if not written down in an archival 
form. 


The early history of the ARPANET and Internet host level protocol design is captured in an 
extraordinary series of documents labelled Request for Comment (RFC). The first RFC was
issued in the spring of 1969, starting the series. The series has been edited by one person who, 
to this day, continues to make editing this series a part of his career. These have become the 
official documents of the Internet Standards community, though they began as a series of 
informal communications among the developers of the ARPANET and, later, Internet 
Protocols. Comparison of the earliest RFCs and the most recent reflects the radically different
treatment these documents got in 1970 versus 1997. Originally issued on paper and
distributed and archived by Stanford Research Institute, these documents are now found
online in many archives of the Internet. Interestingly, the Internet project started a series of
notes, Internet Experiment Notes (IENs), in parallel with the RFCs. At the time, the RFCs
were associated with the continuing development of protocols on the ARPANET. The Internet
work was experimental and it wasn’t clear how it would turn out. It seemed, at the time, 
inappropriate to distract the ARPANET community with the uncertainties of the Internet 
research program. When it became apparent that the Internet protocols really should replace 
the older ARPANET versions, the IEN series was ended and new Internet documents became 
a part of the RFC series. From this experience, we learned that it was preferable to maintain
a single series of notes well, with common indices and the like, than to have multiple note
series and the corresponding problem of deciding where to put things or where to find things
of interest.


One important aspect of the early RFC series is that they conveyed the real debates and 
discussions of design choices and tradeoffs. They formed a record of the primary concepts 
explored in the new territory of packet switching. As email became an increasingly 
widespread and attractive medium of exchange, the RFCs were issued only to capture more 
polished results. Electronic mail archives, however, were established for a number of email 
distribution lists, such as the Header-People list whose members were interested in the 
evolution of electronic mail standards. Another electronic means of mediating discussion 
emerged from the so-called conferencing programs such as PLANET from the Institute for 
the Future. In this system, discussion was conducted in text form and each entry was 
cataloged and archived. An editor/moderator had the ability to edit the archive, and any 
participant had access to the historical record. Various ways of indexing the corpus of the 
archive were provided to help readers to find what they were interested in. Conferencing 
software has evolved but has retained much of the functionality of the early systems. 


Another format for information dissemination grew out of the Unix community. An informal 
network, USENET, of UNIX-based machines grew up during the late 1970s and early 1980s.
Built on the Unix-to-Unix Copy Program (UUCP), the so-called NetNews or NewsGroups 
system was developed. Contributions from all interested parties were merged into feeds 
which were distributed throughout the USENET. Special NewsGroup reading programs (we 
would call them clients today) made it easy to sort through and select from the multi-topic, 
aggregate newsfeed. A major technical and sociological story in itself, the history of 
NewsGroups and USENET deserves a great deal more space than I am able to give it in this 
paper. 


None of the methods mentioned so far is perfect and each has different strengths. General 
email has the problem that it arrives in an undifferentiated avalanche of content that can be 
sorted, typically, by subject line or author or date/time of submission. Moreover, a series of 
messages bearing the same topic (replies on replies, etc.) often drifts away from the original 
topic, interfering with accurate archiving and indexing. NewsGroups have a similar, off-topic 
drift problem but if moderated, the moderator can exercise discipline, as can the moderator of 
a conferencing system or a moderated mailing list. New NewsGroup topics can be formed 
and new conference topics can be created. However, one is then confronted with the problem 
of choosing in which group to move forward a discussion, leading to cross-posting and a 
general blurring of the topicality of any particular NewsGroup or discussion group. A
number of efforts, notably those associated with Jacob Palme in Sweden, have attempted to
merge the models of conferencing systems, newsgroups, and mailing lists into a single user
environment. None of those efforts has been very successful in the marketplace, although
attempts continue in both protocols and commercial products.


Attempts to organize email into topic folders usually suffer from the same disease as 
NewsGroups and conference topics. I find myself filing messages in multiple folders, largely
in the hope that I will remember one of the folder/topic names when it comes time to retrieve
the messages of interest.


ARCHIVES OF INFORMATION 


As networking has spread, so have a variety of technologies for encoding, formatting, 
distributing and presenting information. File archives, containing programs and text files and
data, were among the first to appear, thanks to the invention of anonymous FTP early in the 
ARPANET history. These repositories could be accessed by anyone logging in as user 
anonymous, password guest. It was not until the early 1990s, however, that the tens of 
thousands of file archives were indexed by a program called archie developed at McGill 
University. The system indexed roughly 80,000 FTP repositories to simplify network-wide 
searches for files of particular names. This idea was soon followed by Gopher, developed at 
the University of Minnesota, which created a distributed menu system. Each menu screen 
could come from a different processor on the Internet. Gopher eliminated the need to know 
much, if anything, about host names, IP addresses and so on. In the same general time period, 
the Wide Area Information Servers (WAIS) system was invented to index the full content of 
distributed archives of textual information. The World Wide Web (WWW) brought another 
dimension to the distributed archiving of information, this time in a multimedia setting. Web 
pages containing images, multifont text, audio and video clips and even executable programs 
added a richness to the content of the Internet that had not been seen before, except in 
independent pieces (separate files, etc.). 
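

To make the archie idea concrete, one might picture an index that maps each file name to the archives holding a copy. The Python sketch below is only an illustration of that idea under simple assumptions; the host names, paths and function names are invented, and nothing here describes how archie itself was implemented.

```python
from collections import defaultdict


def build_index(listings):
    """Map each file name to the (host, path) pairs where it appears.

    `listings` is a hypothetical mapping of host name -> list of file paths,
    standing in for the directory listings gathered from each FTP archive.
    """
    index = defaultdict(list)
    for host, paths in listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1]  # keep only the final path component
            index[name.lower()].append((host, path))
    return index


def find(index, name):
    """Return every known location of a file name (case-insensitive)."""
    return sorted(index.get(name.lower(), []))


# Invented example data; these hosts and files are not real archives.
listings = {
    "ftp.example.edu": ["/pub/tools/kermit.tar", "/pub/docs/README"],
    "archive.example.org": ["/mirror/kermit.tar", "/mirror/rfc/rfc822.txt"],
}
index = build_index(listings)
```

A call such as find(index, "kermit.tar") then lists both hosts that carry the file, which is the kind of network-wide search by file name described above.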


It is the WWW which motivates some of the questions surrounding the capture of the history 
of the Digital Age. The other archives and media still persist, as does electronic mail, which is 
itself beginning to take on Web-like characteristics, thanks to improvements such as the 
Multipurpose Internet Mail Extensions (MIME) standards developed by the Internet 
Engineering Task Force (IETF). The World Wide Web has unleashed an avalanche of 
multimedia information flowing into servers around the world. Its standards, such as 
the HyperText Transfer Protocol (HTTP) and HyperText Markup Language (HTML), 
have become a part of the current idiom. The Uniform Resource Locators (URLs) that point 
to the locations of Web pages on the Internet (using Domain Names such as www.mci.com) 
are frequently displayed in print and television advertising, referenced on the radio and 
mentioned in normal, everyday discourse. 


World Wide Web (WWW) 


There are an estimated 500,000 WWW servers on the public Internet, among the estimated 
18-20 million computers on the system (not including all the dial-up desktops and laptops that 
aren’t usually counted in surveys). Corporate private intranets probably account for 
anywhere from 4 to 10 times as many web servers as there are in the public Internet. As an 
increasing amount of business data and correspondence finds its way into Internet-based 
media, the question of archiving, indexing and retrieving such information becomes 
increasingly important and relevant. 


It is possible to print many of the pages found in the WWW. Color is typically needed. If 
printing were a sufficient solution, one might be tempted to adapt the archiving and retrieving 
methods of the print medium to address the problem of archiving the Web. But printing is not 
sufficient. Nor is recording of audio and video clips onto various magnetic media. The WWW 
pages contain a variety of media, encoded for computer interpretation and presentation. These 
various page components can be rendered in various ways, depending on the output medium. 
Some portions can, indeed, be printed. But some components require interpretation as 
sound, as moving images and even as programs. Moreover, keeping the information in its 
computer-interpretable form allows for future, computer-assisted searches to be made. 
Plainly, this is a significant challenge: the cataloging, indexing and archiving of a 
fundamentally new medium. Another aspect that makes printing inadequate (and makes 
historical and documentation work very hard) is the essentially dynamic nature of some of the 
material. Many pages are useful because they change, sometimes rapidly, over time, and 
printing the material gives a snapshot rather than reflecting either the newest versions or the 
process or sequence of changes. 
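

One way to picture capturing the sequence of changes, rather than a single snapshot, is an archive that stores a new dated copy of a page only when its content differs from the last copy stored. The following Python sketch is a minimal illustration under stated assumptions: the class and method names are hypothetical, the fetching of pages is left out entirely, and the use of a SHA-256 digest to detect change is simply a convenient choice, not a method drawn from any existing archiving program.

```python
import hashlib


class SnapshotArchive:
    """Keep a dated sequence of versions for each page, skipping unchanged copies."""

    def __init__(self):
        self._versions = {}  # url -> list of (timestamp, digest, content)

    def capture(self, url, content, timestamp):
        """Record `content` (bytes) for `url` if it differs from the last stored version.

        Returns True when a new version was stored, False when nothing changed.
        """
        digest = hashlib.sha256(content).hexdigest()
        history = self._versions.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False  # same bytes as the most recent capture
        history.append((timestamp, digest, content))
        return True

    def history(self, url):
        """Return the stored (timestamp, content) pairs, oldest first."""
        return [(ts, body) for ts, _, body in self._versions.get(url, [])]
```

Calling capture at regular intervals with whatever bytes were retrieved for a page would leave behind a record of when the page changed and what it said at each point, which is closer to the process of change than any single printout.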


The problems are not easily bounded. New encodings are invented regularly and must 
become a part of the archiving and indexing milieu. Since much of the interpretation is based 
on software, it is necessary to retain the software that can interpret the encoded content or to 
periodically recode the information so it can continue to be interpreted by software that is 
contemporaneously available. One of the most difficult aspects of archiving computer-based 
content has proven to be the short lifetimes of either the storage media themselves 
or the devices that could read and write them. Just try finding software and/or hardware to 
read an 8-inch floppy disk, so popular with early word processing systems, or finding a 
seven-track tape drive. A regular program of copying old media content onto new media might 
help to combat some of these problems. Maintaining the utility of encoded information is going to be 
a major challenge of the Digital Age. The recent entry of the JAVA programming language 
into the Internet may well offer at least one common platform on which to build stability, 
even as new hardware platforms emerge from the rapid pace of semiconductor 
microprocessor evolution. JAVA interpreters can be built on virtually any computing 
platform, allowing JAVA-dependent content to be accessible on newer processing engines. 
But this also encourages more dynamic content and pages built up from multiple-origin 
sources, which may be much harder to capture. 


An alternative thought is to produce interpreters of older machines on the newer ones so that 
old software can still be run. Even if this is only a transitional step towards maintaining 
accessibility, it has been a common technique for preserving older but valuable information. 
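

The regular copying of old media content suggested above can be sketched in a few lines. The Python example below is a rough illustration under simple assumptions: it treats the old and new media as ordinary directories, the function names are hypothetical, and the checksum comparison is an added safeguard of our own choosing rather than a described practice.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source_dir, target_dir):
    """Copy every file under source_dir to target_dir and verify each copy.

    Returns a dict of relative path -> digest for the verified copies.
    Raises IOError if any copy does not match its original.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    manifest = {}
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = src.relative_to(source_dir)
        dst = target_dir / relative
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies the contents and, where possible, timestamps
        before, after = sha256_of(src), sha256_of(dst)
        if before != after:
            raise IOError(f"copy of {relative} does not match the original")
        manifest[str(relative)] = after
    return manifest
```

The returned manifest could be kept alongside the new copy so that the next migration, some years later, can confirm that nothing has decayed in the meantime.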


Summary 


We know a good deal about the history of the Digital Age but our knowledge is becoming 
less complete as this age matures. We are not keeping up with the volume of material that it 
might be wise to archive and index. We do not have programs in place for capturing the 
often-volatile content of the World Wide Web and its corporate counterparts (Company-wide 
web?). We have few tools for archiving and indexing personal, multimedia content, and no 
national programs for archiving of ephemeral information which could be invaluable to the 
historians of a future era. It seems inescapable that there is a wide range of information 
archiving and indexing products and services which would be of interest in all sectors 
(personal, business, academic and government) where Internet technology has taken root. 


That this will be a complex undertaking is surely an understatement. The preservation of 
privacy and corporate confidentiality will compete with future interest in this information. 
Intellectual property rights treatments must be found which are, at least, compatible with the 
needs of preservation. Deciding what to preserve and who has the responsibility for it is 
another complex question. The developers of this technology and its users have a mutual 
responsibility to develop feasible guidelines and techniques for the capture, indexing and 
preservation of valuable information. There is much work to be done, much still to be 
discovered and applied. We cannot avoid the avalanche; we can only try to speed ahead of it. 


i I was tempted to say “volumes” but that seemed archaic, given the topic! 


ii “Computers,” Encyclopaedia Britannica, Encyclopaedia Britannica, Inc., Fifteenth Edition, 1987, 
Macropaedia, Vol. 16, p. 678 ff. 


iii A History of Computing in the Twentieth Century. N. Metropolis, J. Howlett and Gian-Carlo Rota 
(eds.), Academic Press, Inc., Harcourt Brace Jovanovich. 1980. This is an extraordinary volume 
prepared from a collection of essays presented at the International Research Conference on the History 
of Computing held at Los Alamos Scientific Laboratory, 10-15 June, 1976. Legendary names abound in 
the list of authors and in the references. The book includes an extraordinary chapter by Brian Randell 
about the Colossus computers built in England during the Second World War to break the German 
cryptographic codes. 


iv “Transistor,” Encyclopaedia Britannica, Encyclopaedia Britannica, Inc., Fifteenth Edition, 1987, 
Micropaedia, Vol. 11, p. 897. 


v “Integrated Circuit,” Encyclopaedia Britannica, Encyclopaedia Britannica, Inc., Fifteenth Edition, 
1987, Micropaedia, Vol. 6, p. 337. 


vi http://webnuc.nuce.psu.edu/~jhm/20 I/lectures/people.html, Maintained by John Mahaffy: 
jhm@cac.psu.edu 


vii See http://www.isoc.org/internet-history 


